Sentiment Analysis - Impact of Elon Musk's Tweets on BTC price

In this mini-study, we look into the impact of Elon Musk's tweets on the BTC price, since he is one of the biggest crypto influencers on Twitter.

Data scraping - Snscrape vs Tweepy

We used the snscrape library instead of Tweepy because a recent update to the Twitter Developer API terms limits the number of tweets that can be scraped. More details on how to use snscrape are included in the reference link below.
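As a rough illustration of the scraping step, the sketch below collects tweets from one account over a date range. The helper names (`build_query`, `scrape_tweets`), the `limit` default and the output keys are assumptions made for this example, not the study's actual code. The `TwitterSearchScraper` class and the tweet attributes used are from snscrape's Twitter module as we understand it, and may differ between releases.

```python
from itertools import islice


def build_query(username: str, since: str, until: str) -> str:
    """Build a Twitter search query string (dates as YYYY-MM-DD)."""
    return f"from:{username} since:{since} until:{until}"


def scrape_tweets(query: str, limit: int = 1000) -> list:
    """Collect up to `limit` tweets matching `query` as plain dicts."""
    # Imported here so build_query() works even without snscrape installed.
    import snscrape.modules.twitter as sntwitter  # third-party: pip install snscrape

    rows = []
    for tweet in islice(sntwitter.TwitterSearchScraper(query).get_items(), limit):
        rows.append({
            "timestamp": tweet.date,
            # Newer snscrape releases expose `rawContent`; older ones use `content`.
            "text": getattr(tweet, "rawContent", None) or tweet.content,
            "retweets": tweet.retweetCount,
        })
    return rows
```

A call such as `scrape_tweets(build_query("elonmusk", "2021-01-01", "2022-01-01"))` would then return one dict per tweet, ready for the cleaning step.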

Data Cleaning

After collecting the tweets, we keep only the necessary fields: the timestamp, the full text of the tweet, and the retweet count as a virality indicator.

Before we process the tweets, we also remove any mentions (handles) and hyperlinks that could confuse our NLP model. Keywords are kept because they may themselves be influential.
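The removal of handles and links can be sketched with two regular expressions. The patterns below are assumptions chosen for illustration (a handle is "@" followed by word characters, a link starts with "http(s)://" or "www."), and the sample tweet is made up.

```python
import re

# Assumed patterns, not taken from the study's code.
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def clean_tweet(text: str) -> str:
    """Strip hyperlinks and mentions, then collapse leftover whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())


# Made-up sample tweet; the hashtag survives, the handle and link do not.
print(clean_tweet("@someone BTC looks interesting #bitcoin https://example.com/x"))
# -> BTC looks interesting #bitcoin
```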

We also filter the tweets by crypto-related keywords. Unfortunately, a keyword like 'moon' had to be left out because it often picked up news related to SpaceX rather than phrases like 'to the moon' as used in the crypto space.
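A keyword filter of this kind might look like the sketch below. The keyword list is hypothetical (the study's actual list is not given here). Matching on word boundaries is a design choice of this example, so that 'crypto' does not fire on 'cryptography'.

```python
import re

# Hypothetical keyword list for illustration only.
# 'moon' is deliberately absent, for the reason given above.
CRYPTO_KEYWORDS = ["bitcoin", "btc", "crypto", "doge", "dogecoin"]

# Case-insensitive, whole-word matching.
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in CRYPTO_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_crypto_related(text: str) -> bool:
    """True if the tweet text contains at least one keyword."""
    return KEYWORD_RE.search(text) is not None


# Made-up sample tweets.
sample = ["BTC looks interesting", "Starship to the moon", "I like cryptography"]
crypto_only = [t for t in sample if is_crypto_related(t)]
```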

Sentiment analysis with TextBlob and Google Cloud Natural Language API

Using TextBlob, we generate a sentiment score (TB_sscore) and a subjectivity score for each tweet.

TextBlob scoring
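A minimal sketch of the TextBlob scoring step follows. TextBlob's `sentiment` property returns a polarity in [-1, 1] and a subjectivity in [0, 1]. The function names, the `scorer` parameter and the `subjectivity` key are assumptions for this example. Only `TB_sscore` is a name taken from the text.

```python
from typing import Callable, Iterable, List, Tuple


def textblob_sentiment(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) for one piece of text."""
    # Imported lazily so the module loads even without TextBlob installed.
    from textblob import TextBlob  # third-party: pip install textblob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def score_tweets(
    tweets: Iterable[str],
    scorer: Callable[[str], Tuple[float, float]] = textblob_sentiment,
) -> List[dict]:
    """Score each tweet; `scorer` can be swapped to compare different models."""
    rows = []
    for text in tweets:
        polarity, subjectivity = scorer(text)
        rows.append({"text": text, "TB_sscore": polarity, "subjectivity": subjectivity})
    return rows
```

Keeping the scorer as a parameter is only a convenience of this sketch: the same loop can then be reused with another sentiment backend.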

Google Cloud Natural Language API

To use the Google Cloud Natural Language API, you will need to create a Google Cloud account, create a new project, and copy the path to the JSON file containing the keys. Next, enable the Cloud Natural Language API for that project (this requires a billing account, but the trial account comes with $300 of free credit for 90 days).

Scoring
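Assuming the setup above is complete, scoring a tweet could look roughly like this. The helper names (`use_service_account`, `gcl_sentiment`) are hypothetical. The client calls follow the `google-cloud-language` Python library (`language_v1`) as we understand it, where the response carries a score in [-1, 1] and a non-negative magnitude.

```python
import os


def use_service_account(key_path: str) -> None:
    """Point the Google client libraries at the downloaded JSON key file."""
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path


def gcl_sentiment(text: str, client=None):
    """Return (score, magnitude) for one tweet from the Cloud Natural Language API."""
    # Imported lazily so the module loads without the Google client installed.
    from google.cloud import language_v1  # third-party: pip install google-cloud-language

    # Reuse a client across calls when one is passed in; otherwise create one.
    client = client or language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    return sentiment.score, sentiment.magnitude
```

Each call is a billable API request, so in practice one would score each cleaned tweet once and store the result alongside the TextBlob scores.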

How do Elon Musk's tweets influence the price?

We import free XBTUSD minute data downloaded from http://api.bitcoincharts.com/v1/csv/ and compare it against the sentiment scores. We have also included an additional column with a manual assessment of each tweet: we simply label a tweet +1 if it is deemed to have a positive impact on Bitcoin or -1 if it is deemed to have a negative impact.

Returns for three trading durations (30 minutes, 1 hour and 2 hours) after each tweet are also computed. Transaction costs are not factored in.
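One way to compute such returns is sketched below, under stated assumptions: prices are given as two parallel lists sorted by time, the entry price is the first price at or after the tweet, and the exit price is the first price at or after the end of each duration. The function names and the synthetic price series are made up for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# The three trading durations named in the text.
HORIZONS = {
    "30min": timedelta(minutes=30),
    "1h": timedelta(hours=1),
    "2h": timedelta(hours=2),
}


def price_at_or_after(times, prices, when):
    """First available price at or after `when`, or None if the series ends earlier."""
    i = bisect_left(times, when)
    return prices[i] if i < len(times) else None


def forward_returns(times, prices, tweet_time):
    """Simple return from the tweet time to each horizon, ignoring transaction costs."""
    entry = price_at_or_after(times, prices, tweet_time)
    result = {}
    for name, delta in HORIZONS.items():
        exit_price = price_at_or_after(times, prices, tweet_time + delta)
        if entry is None or exit_price is None:
            result[name] = None
        else:
            result[name] = exit_price / entry - 1.0
    return result


# Synthetic minute series: the price rises by 1 every minute from 100.
start = datetime(2021, 1, 1)
times = [start + timedelta(minutes=i) for i in range(181)]
prices = [100.0 + i for i in range(181)]
example = forward_returns(times, prices, start)
```

A horizon that runs past the end of the price series yields `None` rather than a guessed value.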

Prior to 2021, Elon Musk tweeted far less about cryptocurrency than he did in 2021. Before 2021, more data points fall below 0, meaning the price moved in the opposite direction to the sentiment implied by the tweet, which skewed the overall win rate. Based on tweet data from 2021 onwards (when Elon Musk tended to be more vocal and active in tweeting about crypto), we would in fact have had a greater than 60% chance of winning within 1-2 hours by trading on his tweets.
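The win rate discussed above can be sketched as follows, assuming a trade "wins" when the manual label (+1 or -1) and the subsequent return have the same sign. That definition, the treatment of a zero return as a non-win, and the sample numbers are assumptions of this example.

```python
def signed_returns(labels, returns):
    """Return of trading in the direction of each label; below 0 means the price went the other way."""
    return [label * r for label, r in zip(labels, returns) if r is not None]


def win_rate(labels, returns):
    """Share of trades with a strictly positive signed return, or None if there are no trades."""
    signed = signed_returns(labels, returns)
    if not signed:
        return None
    return sum(1 for s in signed if s > 0) / len(signed)


# Made-up labels and returns, for illustration only.
labels = [1, -1, 1, -1]
returns = [0.01, -0.02, -0.01, 0.03]
print(win_rate(labels, returns))  # -> 0.5
```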

Interim conclusion

Although the downside of the Cloud Natural Language API is that it is a paid service, it does outperform TextBlob. It was able to read emoji well, but many tweets (including sarcastic ones) were still interpreted incorrectly. This could also be because the context of the replies was not taken into account.

We also manually computed the 30-min, 1-hr and 2-hr PnL of BTC immediately after each tweet was posted, and there was no correlation between the GCL sentiment score or the manual sentiment assessment and the actual PnL. To use this as an entry/exit signal, we would need to further improve how the prediction model reads tweets and how we interpret the whole chain of conversation. However, because such tweets do not occur very often, we could in fact trade them on a discretionary basis.
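A correlation check of this kind could be done with a Pearson coefficient, sketched below in plain Python. The text does not say which measure was actually used, so the choice of Pearson is an assumption, and the sample numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


# Made-up sentiment scores and PnL values, for illustration only.
scores = [0.8, -0.3, 0.1, 0.5]
pnl = [0.002, 0.004, -0.001, -0.003]
r = pearson(scores, pnl)
```

A value of `r` close to 0 would be consistent with the "no correlation" finding. With as few data points as crypto-related tweets provide, any such coefficient should be read with caution.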

Looking at his retweet counts, the tweets he made in 2021 appear to be more viral and could potentially 'influence' the market more. We also filtered for 2021 data alone and found higher win rates (more than 60%) if we were to trade on his tweets with holding durations between 30 minutes and 2 hours.

Reference